PepTiger: Search Engine for Error-Tolerant Protein Identification from de Novo Sequences

Authors

  • Irina Fedulova
  • Zheng Ouyang
  • Charles Buck
  • Xiang Zhang

Abstract

In recent years, a number of de novo sequencing software products have become available that provide possible partial or complete amino acid sequence tags for MS/MS spectra of peptides. However, for a variety of reasons, including chemical noise in the spectra and imperfect fragmentation, these sequence tags almost always contain errors. Additional difficulties arise from actual protein sequence variation and post-translational modifications. We present a search engine named PepTiger, which is capable of correctly matching de novo sequence tags that contain errors to protein sequences in a protein database. The algorithm is based on approximate string matching followed by a novel scoring procedure that takes into account the mass differences and string distance between the de novo sequence and matched peptides, as well as similarities between theoretical and experimental MS/MS spectra. A comparison of PepTiger with other protein identification software shows that PepTiger is better able to assign de novo sequence tags with errors to the correct peptide sequences.
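The abstract names approximate string matching as the first stage but gives no implementation details. As a rough illustration of that general idea only, the sketch below uses the classic Sellers dynamic-programming scheme to find where a de novo tag occurs in a protein sequence within a given number of edits. It is not PepTiger's actual algorithm: the function name `approx_find`, its parameters, and the example sequences are all hypothetical, and the mass-based and spectrum-based scoring described in the abstract is not modeled.

```python
def approx_find(tag: str, protein: str, max_errors: int) -> list[tuple[int, int]]:
    """Hypothetical helper: return (end_position, edit_distance) pairs.

    A pair is reported for every 1-based position in `protein` at which some
    substring ending there is within `max_errors` edits (substitution,
    insertion, deletion) of `tag`.
    """
    m = len(tag)
    # prev[i] = edit distance between tag[:i] and the best-matching substring
    # ending at the previous protein position (initially the empty prefix).
    prev = list(range(m + 1))
    hits = []
    for j, residue in enumerate(protein, start=1):
        # curr[0] stays 0 so that a match may start at any protein position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if tag[i - 1] == residue else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra residue in the protein
                curr[i - 1] + 1,     # tag residue missing from the protein
            )
        if curr[m] <= max_errors:
            hits.append((j, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    # Made-up example: the tag reads I where the protein has L. Leucine and
    # isoleucine have the same mass, so this is a common de novo ambiguity.
    print(approx_find("PEPTIDE", "MKPEPTLDEQR", max_errors=1))  # [(9, 1)]
```

A plain edit distance treats every substitution equally. A scoring step like the one the abstract describes would presumably weigh candidates further, for instance by how well the mass difference is explained, but that part is left out here because the abstract does not specify it.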


Similar resources

Defining parameters for homology-tolerant database searching.

De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins,...


Overcoming Species Boundaries in Peptide Identification with BICEPS

Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. While sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides which are not exactly contained in a prot...


MultiTag: multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry.

The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence database...


Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...


Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry.

There are several computer programs that can match peptide tandem mass spectrometry data to their exactly corresponding database sequences, and in most protein identification projects, these programs are utilized in the early stages of data interpretation. However, situations frequently arise where tandem mass spectral data cannot be correlated with any database sequences. In these cases, the u...




Publication date: 2007